NPS-55Bw76061






NAVAL POSTGRADUATE SCHOOL 


Monterey, California 




ON RANDOM BINARY TREES 
by 

Gerald G. Brown
and

Bruno O. Shubert
June 1976 

Technical Report for period June 1975-June 1976



FEDDOCS 
D 208.14/2: 
NPS-55BW76061 



Approved for public release; distribution unlimited 
Prepared for: 

Chief of Naval Research, Arlington, Virginia 22217



NAVAL POSTGRADUATE SCHOOL 
Monterey, California 



Rear Admiral Isham Linder, Superintendent
Jack R. Borsting, Provost



The work reported herein was supported by the Foundation Research 
Program with funds provided by the Chief of Naval Research, Arlington, 
Virginia, during FY 76. 



Reproduction of all or part of this report is authorized. 



UNCLASSIFIED

REPORT DOCUMENTATION PAGE

1. REPORT NUMBER: NPS-55Bw76061
4. TITLE (and Subtitle): ON RANDOM BINARY TREES
5. TYPE OF REPORT & PERIOD COVERED: Technical Report, 06/75-06/76
7. AUTHOR(s): Gerald G. Brown and Bruno O. Shubert
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Naval Postgraduate School,
   Monterey, California 93940
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS: N0001476WR60052
11. CONTROLLING OFFICE NAME AND ADDRESS: Chief of Naval Research,
    Arlington, Virginia 22217
12. REPORT DATE: June 1976
13. NUMBER OF PAGES: 49
15. SECURITY CLASS. (of this report): UNCLASSIFIED
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release;
    distribution unlimited



19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

Binary Trees, Random Binary Trees, Insertion Tree Sorting, Binary Search Trees,
Height of Random Trees, Asymptotic Random Trees, Counting Binary Trees,
Recursive Enumeration of Trees, Vacancies in Binary Trees, Comparisons for
Binary Insertion Sorting, Growing Binary Trees



20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

Binary trees are examined combinatorially with the view of providing
information useful in analyzing algorithms based on this widely used storage
structure. Exact and asymptotic results are given for equally likely trees and
those grown by binary insertion tree sorts applied to random strings of key
symbols. An appendix is provided with tabulations of results.






1. INTRODUCTION 



A binary tree is a finite set of nodes, either empty or containing one node 
called a root, such that all other nodes are partitioned into disjoint sets which 
are respectively called left and right subtrees of the root. The subtrees also 
satisfy the definition of a binary tree. Thus, a binary tree is an unlabelled 
rooted arborescence with successors of at most degree two distinguished only as 
left and right. 

Figure 1.1 shows a binary tree with six nodes. The root node is shown at 
the top and is connected by arcs to two immediate successor nodes which are the 
roots of its left and right subtrees. Each node with no successors (for
instance, the left subtree of the root) is called a leaf. The level of a node
indicates how deep it is within the tree. Thus the root has level one, its
immediate successor nodes have level two, and so forth down the occupied
portions of the subtrees. The height of a binary tree is the largest occupied
level. A full binary tree has no internal vacancies (unoccupied node positions).

FIGURE 1.1 - A Binary Tree with Six Nodes

Binary trees are frequently used as information storage structures on 
digital computers. For instance, one of the most popular methods of randomly 
retrieving information by a key, or symbol, is to store the key data in a 
binary tree. To search for a particular symbol, we begin by looking at the
root and proceed by applying the following rules recursively:

1. If the symbol matches the root symbol, the symbol is found. 

2. If the symbol is "less than" the root (according to some binary
ordering relation) continue the search by considering the left 
successor of the root as the new root (of the left subtree). 

3. If the symbol is greater than the root, continue by searching the 
right subtree. 

4. If there is no root, the symbol is not in the binary tree. 
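
The four rules above can be sketched in Python. This is a minimal illustration
and not the authors' implementation: the names `Node` and `search` are
hypothetical, the ordering relation is assumed to be Python's built-in string
comparison, and the example tree is built by hand.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def search(root: Optional[Node], symbol: str) -> Tuple[bool, int]:
    """Return (found, number of nodes examined) for the given symbol."""
    examined = 0
    node = root
    while node is not None:            # rule 4 does not apply yet: a root exists
        examined += 1
        if symbol == node.key:         # rule 1: match
            return True, examined
        if symbol < node.key:          # rule 2: continue in the left subtree
            node = node.left
        else:                          # rule 3: continue in the right subtree
            node = node.right
    return False, examined             # rule 4: no root, symbol is absent


# A hand-built six-node tree: B(A, D(C, F(E, -)))
tree = Node("B", Node("A"), Node("D", Node("C"), Node("F", Node("E"))))
print(search(tree, "E"))   # found after examining B, D, F, E
print(search(tree, "G"))   # absent after examining B, D, F
```

The loop form is equivalent to applying the rules recursively, since each step
simply replaces the current root by the root of one subtree.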

We assume for simplicity that all symbols are distinct with respect to the
ordering relation. Otherwise, the ordering relation and search must be
modified in an obvious fashion. The construction of a binary tree for use by
such a search scheme may be performed by sequentially examining the key
symbols to be inserted. This binary tree sort is a one-pass ordering procedure
which proceeds as follows:

1. If there is no root, insert the symbol as the root. 

2. If the symbol is less than the root symbol, continue by considering 
the left subtree. 

3. If the symbol is greater than the root symbol, continue with the 
right subtree. 

As an example, consider the six symbols ABCDEF and a lexicographical binary
ordering relation. Suppose that the particular permutation of symbols
examined is BDAFCE. The resulting binary tree is shown in Figure 1.2, and has
structure identical to the tree in Figure 1.1. Note that this same tree could
have resulted from other permutations of the same symbols, for instance
BADCFE. Therefore, there is a many-to-one mapping of key symbol permutations
to corresponding binary trees.
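
The insertion rules and the BDAFCE example can be checked with a short Python
sketch. The names `Node`, `insert`, `grow` and `height` are hypothetical and
chosen only for illustration; symbols are assumed distinct, as in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], symbol: str) -> Node:
    """Insert one symbol by the three rules of the binary tree sort."""
    if root is None:                       # rule 1: no root, so insert here
        return Node(symbol)
    if symbol < root.key:                  # rule 2: go to the left subtree
        root.left = insert(root.left, symbol)
    elif symbol > root.key:                # rule 3: go to the right subtree
        root.right = insert(root.right, symbol)
    return root


def grow(symbols: str) -> Optional[Node]:
    """Grow a tree by inserting the symbols in the order given."""
    root = None
    for s in symbols:
        root = insert(root, s)
    return root


def height(root: Optional[Node]) -> int:
    """Largest occupied level, with the root at level one."""
    return 0 if root is None else 1 + max(height(root.left), height(root.right))


# Dataclass equality compares keys and subtrees recursively.
print(grow("BDAFCE") == grow("BADCFE"))   # the two permutations give one tree
print(height(grow("BDAFCE")))
```

Both permutations produce the tree B(A, D(C, F(E, -))), which illustrates the
many-to-one mapping from permutations to trees.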




FIGURE 1.2 - A Binary Tree with Inserted Symbols

The height of the binary tree in Figure 1.2 is four and thus the maximum
number of comparisons required to insert another symbol is four. Similarly, 
if this tree is used for retrieving symbols, the maximum search length for a 
symbol in the tree is four, and for a symbol not found the maximum is five. 

A computer implementation of a binary tree storage structure requires 
that each node be represented by its key symbol accompanied by sufficient 
additional information to identify and access the left and right subtrees. 

This is usually accomplished by use of a dense array of node symbols each 
with left and right pointers, by node storage via address calculation into an 
array space sufficient to store all possible binary trees with a given number 
of nodes and some maximum height, or by some similar method. 
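
As an illustration of storage by address calculation, the sketch below uses
one common scheme, assumed here and not specified in the report: the root
occupies position 1 of an array and the node at position i has its left and
right successors at positions 2i and 2i+1. The function names are hypothetical.

```python
def make_slots(max_height):
    """Array able to hold any binary tree of height at most max_height.

    Index 0 is unused, leaving 2**max_height - 1 usable positions.
    """
    return [None] * (2 ** max_height)


def insert_by_address(slots, symbol):
    """Insert symbol and return the array position where it is stored."""
    i = 1
    while i < len(slots):
        if slots[i] is None:          # vacancy found: store the symbol here
            slots[i] = symbol
            return i
        if symbol == slots[i]:        # already present
            return i
        i = 2 * i if symbol < slots[i] else 2 * i + 1
    raise OverflowError("insertion would exceed the maximum height")


slots = make_slots(4)
positions = [insert_by_address(slots, s) for s in "BDAFCE"]
print(positions)
```

No pointers are stored, at the cost of reserving space for every possible
position up to the chosen maximum height.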

In the following sections we study this widely used class of binary trees 
in order to provide information useful in examining algorithms based on this 
storage structure. A closed form counting formula for the number of binary 
trees with n nodes and height k is developed and restated as a recursion more 
useful computationally. A generating function for the number of nodes given 
height is developed and used to find the asymptotic distribution of binary 
trees. An asymptotic probability distribution for height given the number of 
nodes is derived based on equally likely binary trees. This is compared with 
a similar result for general trees. 

Random binary trees (those resulting from the binary tree sorting algorithm
applied to random strings of symbols) are counted in terms of the mapping
of permutations of n symbols to binary trees of height k. An explicit formula
for this number is given with an equivalent recursive definition for
computational use. A generating function is derived for the number of symbols
given height. Lower and upper bounds on random binary tree height are developed
and shown to approach one another asymptotically as a function of n, providing
a limiting expression for the expected height.

The random binary trees are examined further to provide expressions for
the expectations of the number of vacancies at each level, the distribution of 
vacancies over all levels, the comparisons required for insertion of a new 
random symbol, the fraction of nodes occupied at a particular level, the number 
of leaves, the number of single vacancies at each level, and the number of twin 
vacancies at each level. A random process is defined for the number of symbols 
required to grow a tree exceeding any given height. 

Finally, an appendix is given with sample tabulations and figures of the
distributions.



2. NUMBER OF BINARY TREES OF A GIVEN HEIGHT. 

In this section we consider the problem of finding the number of binary
trees with n nodes and height k. Denote this number by t(n,k), where n
and k are positive integers. Since t(n,k) = 0 unless

$$ k \le n \le 2^k - 1, \tag{2.1} $$

we are only concerned with integers n = 1,2,...; k = 1,2,... satisfying the
inequality (2.1).

An explicit formula for the numbers t(n,k) can be obtained by the following
simple combinatorial argument. Consider the class of all binary trees
with n nodes and height k which have exactly $m_j$ nodes at the level j+1,
j = 1,...,k-1. Let $m_j = l_j + r_j$, where $l_j$ and $r_j$ are the numbers of
nodes which are left successors and right successors of nodes at the level j.
In other words, $l_j$ is the number of nodes at level j+1 at the end of left
going arcs emanating from nodes at level j. These can be selected in
$\binom{m_{j-1}}{l_j}$ ways, and the $r_j$ nodes in $\binom{m_{j-1}}{r_j}$
ways. Thus, the total number of ways to arrange the arcs between nodes at
levels j and j+1 is given by

$$ \sum_{\substack{l_j \ge 0,\ r_j \ge 0 \\ l_j + r_j = m_j}} \binom{m_{j-1}}{l_j} \binom{m_{j-1}}{r_j} = \binom{2m_{j-1}}{m_j}. \tag{2.2} $$

Since $m_0 = 1$ (the root) and $m_1 + \cdots + m_{k-1} = n-1$ with $m_j \ge 1$
for j = 1,...,k-1, we obtain from (2.2) the formula

$$ t(n,k) = \sum \prod_{j=1}^{k-1} \binom{2m_{j-1}}{m_j}, \tag{2.3} $$

where the summation is over all integers $m_j$, j = 1,...,k-1, satisfying

$$ m_j \ge 1, \quad j = 1,\ldots,k-1, \qquad \text{and} \qquad m_1 + \cdots + m_{k-1} = n-1. $$

The formula is valid for n > 1 and k satisfying (2.1); for n = 1 we have
trivially t(1,1) = 1.
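
The explicit count can be evaluated directly for small n and k. The sketch
below is illustrative only and assumes the reading of the formula used here:
with $m_0 = 1$, a level profile $(m_1, \ldots, m_{k-1})$ of positive integers
summing to n-1 contributes the product of the binomial coefficients
$\binom{2m_{j-1}}{m_j}$. The function names are hypothetical.

```python
from math import comb


def compositions(total, parts):
    """Yield all tuples of `parts` positive integers summing to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def t_explicit(n, k):
    """Number of binary trees with n nodes and height k, summed over profiles."""
    count = 0
    for profile in compositions(n - 1, k - 1):   # (m_1, ..., m_{k-1})
        ways, previous = 1, 1                    # m_0 = 1 is the root
        for m in profile:
            ways *= comb(2 * previous, m)        # pick m of 2*previous successor slots
            previous = m
        count += ways
    return count


print([t_explicit(4, k) for k in range(1, 5)])   # → [0, 0, 6, 8]
```

The number of profiles grows quickly with n and k, which is one way to see why
the formula is inconvenient for calculation.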

Although (2.3) is an explicit formula for the number t(n,k), it is not
very convenient for calculation. An alternate way is through a recurrence.

Let

$$ t(n,k) = T(n,k) - T(n,k-1), \qquad n \ge 1,\ k \ge 1, \tag{2.4} $$

where T(n,k) is the number of binary trees with n nodes and height not
exceeding k. If we define

$$ T(n,0) = \begin{cases} 1 & \text{if } n = 0 \\ 0 & \text{if } n > 0 \end{cases} \tag{2.5} $$

and T(0,k) = 1 for $k \ge 0$, we obtain the recurrence relation

$$ T(n+1,k+1) = \sum_{j=0}^{n} T(j,k)\,T(n-j,k), \tag{2.6} $$

valid for $n \ge 0$ and $k \ge 0$. This follows from the fact that the class of
all binary trees with n+1 nodes and height not exceeding k+1 can be partitioned
into n+1 subclasses according to the number of nodes j in the left subtree
of the root. Since the heights of both the left and right subtrees must not
exceed k, the number of trees in the j-th class is the product T(j,k)T(n-j,k),
and (2.6) follows.
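
A memoized Python sketch of the recurrence follows. It is a minimal
illustration under the conventions T(0,k) = 1 and T(n,0) = 0 for n > 0; the
function names `T` and `t` simply mirror the notation of the text.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n, k):
    """Number of binary trees with n nodes and height not exceeding k."""
    if n == 0:
        return 1                  # the empty tree, for every k >= 0
    if k == 0:
        return 0                  # no nonempty tree has height 0
    # Recurrence (2.6) with n+1 -> n and k+1 -> k: split on the size j
    # of the left subtree; the right subtree gets the remaining n-1-j nodes.
    return sum(T(j, k - 1) * T(n - 1 - j, k - 1) for j in range(n))


def t(n, k):
    """Number of binary trees with n nodes and height exactly k (k >= 1)."""
    return T(n, k) - T(n, k - 1)


for n in range(1, 7):
    print(n, [t(n, k) for k in range(1, n + 1)])
```

Each value T(n,k) is computed once, so a table for all n up to N and k up to K
costs on the order of N squared times K multiplications.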

Note that with the convention (2.5) the recurrence (2.6) automatically yields

$$ T(n,k) = 0 \quad \text{for } n > 2^k - 1, $$

and that for $0 \le n \le k$, T(n,k) is just the number of binary trees with
n nodes. It is well known (see [Knuth, vol. 3]) that the latter are the Catalan
numbers

$$ C_n = \frac{1}{n+1} \binom{2n}{n}, \qquad n \ge 0, \tag{2.7} $$

so that

$$ T(n,k) = C_n \quad \text{for } 0 \le n \le k. \tag{2.8} $$



From the recurrence (2.6) one easily obtains the sequence enumerators
defined by

$$ f_k(x) = \sum_{n \ge 0} T(n,k)\,x^n, \qquad k \ge 0. \tag{2.9} $$

Since the right-hand side of (2.6) is the Cauchy product, we have immediately

$$ \sum_{n \ge 0} T(n+1,k+1)\,x^n = f_k^2(x), $$

from which in view of (2.5) we obtain

$$ f_{k+1}(x) = 1 + x f_k^2(x), \quad k \ge 0, \qquad \text{with} \quad f_0(x) = 1. \tag{2.10} $$
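
The enumerators can be generated as coefficient lists by repeating the step
$f_{k+1}(x) = 1 + x f_k^2(x)$ from $f_0(x) = 1$. The sketch below is
illustrative; the function names are hypothetical and plain Python lists stand
in for polynomials, with index n holding the coefficient of $x^n$.

```python
def next_enumerator(f):
    """Given the coefficients of f_k, return those of f_{k+1} = 1 + x * f_k^2."""
    square = [0] * (2 * len(f) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(f):
            square[i + j] += a * b        # Cauchy product of f_k with itself
    return [1] + square                   # multiply by x (shift) and add 1


def enumerator(k):
    """Coefficient list of f_k, i.e. [T(0,k), T(1,k), ...]."""
    f = [1]                               # f_0(x) = 1
    for _ in range(k):
        f = next_enumerator(f)
    return f


print(enumerator(3))   # → [1, 1, 2, 5, 6, 6, 4, 1]
```

The list for $f_k$ has $2^k$ entries, reflecting that T(n,k) vanishes for
$n > 2^k - 1$.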



Note that $f_k(x)$ is a polynomial in x; hence if x is regarded as a complex
variable, $f_k$, k = 0,1,..., is a sequence of entire functions.

We now show that this sequence converges uniformly in a circular region
of the complex plane, specifically that as $k \to \infty$

$$ f_k(z) \to \omega(z) \quad \text{uniformly for } |z| \le \tfrac14, \tag{2.11} $$

where

$$ \omega(z) = \sum_{n=0}^{\infty} \frac{1}{n+1} \binom{2n}{n} z^n = \frac{1 - \sqrt{1-4z}}{2z}. \tag{2.12} $$



To see this, note that with $C_n$ as in (2.7) we have $T(n,k) \le C_n$ for all
k, which together with (2.8) yields

$$ \omega(z) - f_k(z) = z^k \sum_{n>k} \bigl(C_n - T(n,k)\bigr) z^{n-k}, $$

so that for $|z| \le \tfrac14$

$$ |\omega(z) - f_k(z)| \le \sum_{n>k} C_n |z|^n \le \sum_{n>k} C_n 4^{-n}, $$

which is a tail of the expansion $\omega(\tfrac14) = \sum_{n \ge 0} C_n 4^{-n} = 2$.
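
The convergence can be observed numerically. The sketch below is an
illustration only: it evaluates $f_k(z)$ by iterating $f_{k+1} = 1 + z f_k^2$
from $f_0 = 1$ and compares with the closed form $(1 - \sqrt{1-4z})/(2z)$,
using the principal square root from Python's `cmath` module. The sample
points are arbitrary choices inside and on the circle $|z| = 1/4$.

```python
import cmath


def f(k, z):
    """Evaluate f_k(z) by k steps of f -> 1 + z * f**2 starting from 1."""
    value = 1
    for _ in range(k):
        value = 1 + z * value * value
    return value


def omega(z):
    """Closed form of the limit; requires z != 0."""
    return (1 - cmath.sqrt(1 - 4 * z)) / (2 * z)


for z in (0.2, 0.25j, 0.25):
    errors = [abs(omega(z) - f(k, z)) for k in (5, 20, 80)]
    print(z, errors)
```

At z = 1/4 the error still decreases with k, but far more slowly than at the
other two points, which is consistent with the special role of the positive
real axis in the estimates.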



This result will now be used to develop an asymptotic distribution of
the numbers t(n,k) as $k \to \infty$. In doing this, we follow the method of
Renyi and Szekeres used in [8] for a similar problem.

From (2.4) we have, for $k \ge 1$ and with t(0,k) = 0,

$$ \sum_{n \ge 0} t(n,k)\,z^n = f_k(z) - f_{k-1}(z), \tag{2.13} $$

which are entire functions for every $k \ge 1$. Hence by the Cauchy formula

$$ t(n,k) = \frac{1}{2\pi i} \oint \frac{f_k(z) - f_{k-1}(z)}{z^{n+1}}\,dz, \tag{2.14} $$

where we take the circle $|z| = \tfrac14$ as the contour of integration. To
estimate the integral we use Laplace's method by first showing that as
$k \to \infty$ the only significant contribution of the integrand is in the
vicinity of the positive real axis.
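
The Cauchy formula on the circle $|z| = 1/4$ can be checked numerically for
small cases. With $z = \tfrac14 e^{i\phi}$ the contour integral becomes
$(4^n / 2\pi) \int [f_k(z) - f_{k-1}(z)]\,e^{-in\phi}\,d\phi$ over one period.
The sketch below replaces the integral by an equally spaced sum with $2^k$
points, which is an assumption of this illustration rather than the report's
method; the sum is exact up to rounding because the integrand is a polynomial
of degree $2^k - 1$ in z. The function names are hypothetical.

```python
import cmath
import math


def f(k, z):
    """Evaluate f_k(z) by iterating f -> 1 + z * f**2 from f_0 = 1."""
    value = 1
    for _ in range(k):
        value = 1 + z * value * value
    return value


def t_cauchy(n, k):
    """Approximate t(n,k) from samples of f_k - f_{k-1} on |z| = 1/4 (k >= 1)."""
    points = 2 ** k                     # exceeds the polynomial degree 2**k - 1
    total = 0
    for s in range(points):
        phi = 2 * math.pi * s / points
        z = 0.25 * cmath.exp(1j * phi)
        total += (f(k, z) - f(k - 1, z)) * cmath.exp(-1j * n * phi)
    return (4 ** n) * total / points


print(t_cauchy(3, 3))   # real part close to 4, imaginary part close to 0
print(t_cauchy(4, 3))   # real part close to 6
```

The factor $4^n$ amplifies rounding error, so this direct evaluation is only
practical for small n.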

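Let

$$ F(\phi,z) = 1 + \tfrac14 e^{i\phi} z^2, \qquad -\pi < \phi \le \pi, \tag{2.15} $$

and let, for $\phi$ fixed,

$$ F_{k+1}(\phi,z) = F(\phi, F_k(\phi,z)), \quad k \ge 1, \qquad F_1(\phi,z) = F(\phi,z). $$

Then

$$ f_k\bigl(\tfrac14 e^{i\phi}\bigr) = F_k(\phi,1), \qquad k \ge 1, \tag{2.16} $$

i.e. the $f_k$ are iterates of the function F. For $z = \tfrac14 e^{i\phi}$ the
function $\omega(z)$ defined by (2.12) satisfies

$$ F(\phi,\omega) = \omega, \tag{2.17} $$

and is given by

$$ \omega\bigl(\tfrac14 e^{i\phi}\bigr) = 2e^{-i\phi}\bigl(1 - \sqrt{1-e^{i\phi}}\bigr), \tag{2.18} $$

where $\operatorname{Re}\sqrt{1-e^{i\phi}} \ge 0$. The curve
$\omega(\tfrac14 e^{i\phi})$, $-\pi < \phi \le \pi$, is therefore the
curve of fixed points of the function $F(\phi,\cdot)$. For any particular
$\phi$ the derivative is

$$ \frac{\partial}{\partial z} F(\phi,z) = F'(\phi,z) = \tfrac12 e^{i\phi} z, $$

so that for $z = \omega$ we have

$$ |F'(\phi,\omega)| = \bigl|\tfrac12 e^{i\phi}\omega\bigr| = \bigl|1 - \sqrt{1-e^{i\phi}}\bigr|. \tag{2.19} $$

The locus of points $\sqrt{1-e^{i\phi}}$, $-\pi < \phi \le \pi$, in the complex
plane is a Lemniscate of Bernoulli with polar equation
$\rho^2 = 2\cos 2\theta$, while $1 - \sqrt{1-e^{i\phi}}$
just shifts the curve one unit to the right. (See Figure 2.1)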




FIGURE 2.1 



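Hence

$$ |F'(\phi,\omega)| \le 1, \tag{2.20} $$

with equality if and only if $\phi = 0$. Thus, if $\phi \ne 0$ all fixed points
(2.18) are attractive, and since the $f_k$ are iterates of the function F
converging to $\omega$ this implies that as $k \to \infty$

$$ f_k\bigl(\tfrac14 e^{i\phi}\bigr) = \omega\bigl(\tfrac14 e^{i\phi}\bigr) + O\Bigl(\bigl|1 - \sqrt{1-e^{i\phi}}\bigr|^k\Bigr) \tag{2.21} $$

uniformly for $|\phi| \ge \varepsilon > 0$.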
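
The behaviour of the multiplier $|1 - \sqrt{1-e^{i\phi}}|$ and the resulting
convergence of the iterates away from $\phi = 0$ can be examined numerically.
The sketch below is illustrative only; it relies on `cmath.sqrt` returning the
principal root, whose real part is non-negative, and the function names are
hypothetical.

```python
import cmath


def multiplier(phi):
    """|F'(phi, omega)| = |1 - sqrt(1 - e^{i phi})| with Re sqrt >= 0."""
    return abs(1 - cmath.sqrt(1 - cmath.exp(1j * phi)))


def iterate_error(phi, k):
    """|f_k(z) - omega(z)| at z = (1/4) e^{i phi}, for phi != 0 or phi = 0."""
    z = 0.25 * cmath.exp(1j * phi)
    omega = (1 - cmath.sqrt(1 - 4 * z)) / (2 * z)
    value = 1
    for _ in range(k):
        value = 1 + z * value * value      # one application of F(phi, .)
    return abs(value - omega)


for phi in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(phi, multiplier(phi), iterate_error(phi, 30))
```

The multiplier equals 1 at $\phi = 0$ and is smaller elsewhere, so the error
after a fixed number of iterations is much larger near the positive real axis
than on the rest of the circle.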

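The modulus $|1 - \sqrt{1-e^{i\phi}}|$ is a decreasing function of $|\phi|$ in
$0 \le |\phi| \le \pi$ with maximum 1 at $\phi = 0$. Further, denoting
$u_1(\phi) + i v_1(\phi) = 1 - \sqrt{1-e^{i\phi}}$, the real and imaginary
parts have asymptotic expansions

$$ u_1(\phi) = 1 - \sqrt{\tfrac{|\phi|}{2}} - \tfrac12 \Bigl(\sqrt{\tfrac{|\phi|}{2}}\Bigr)^3 + O\bigl(|\phi|^{5/2}\bigr), $$

$$ v_1(\phi) = \operatorname{sgn}(\phi) \Bigl[\sqrt{\tfrac{|\phi|}{2}} - \tfrac12 \Bigl(\sqrt{\tfrac{|\phi|}{2}}\Bigr)^3\Bigr] + O\bigl(|\phi|^{5/2}\bigr), \tag{2.22} $$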



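as $|\phi| \to 0$. Hence for sufficiently small $|\phi|$

$$ \bigl|1 - \sqrt{1-e^{i\phi}}\bigr| < 1 - \tfrac13 \sqrt{|\phi|}, $$

and thus if we choose for instance $|\phi| \ge \bigl(\tfrac{\ln^2 k}{k}\bigr)^2$
we obtain from (2.21)

$$ f_k\bigl(\tfrac14 e^{i\phi}\bigr) = \omega\bigl(\tfrac14 e^{i\phi}\bigr) + O\bigl(e^{-\frac13 \ln^2 k}\bigr) \quad \text{as } k \to \infty. \tag{2.23} $$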



Remark: The same result can also be obtained more directly by writing

$$ f_{k+1}(z) - \omega(z) = 1 + z f_k^2(z) - \bigl(1 + z\,\omega^2(z)\bigr) = z\,[f_k(z) - \omega(z)]\,[f_k(z) + \omega(z)]. $$

Then choose $\varepsilon > 0$ such that $|\omega(\tfrac14 e^{i\phi})| \ge \varepsilon$
for all $\phi$, and choose K such that $k \ge K$ implies
$|f_k(\tfrac14 e^{i\phi}) - \omega(\tfrac14 e^{i\phi})| < \varepsilon$. Then with
$z = \tfrac14 e^{i\phi}$, for $k \ge K$,

$$ |f_{k+1} - \omega| \le \tfrac14\,|f_k - \omega|\,(2|\omega| + \varepsilon), $$

whence for all $m \ge 1$

$$ |f_{K+m} - \omega| \le |f_K - \omega|\,\bigl(\bigl|1 - \sqrt{1-e^{i\phi}}\bigr| + \varepsilon\bigr)^m. $$

Hence (2.23) follows by the same argument as before.

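If we now substitute (2.23) into (2.14) we have, with $z = \tfrac14 e^{i\phi}$,

[Equation (2.24), an integral over $\phi$ with kernel $e^{-in\phi}$ plus an
$O(\cdot)$ remainder, is not legible in the source.]

as $k \to \infty$.

To estimate the remaining integral we first set $\phi = 0$ and call

$$ \alpha_k = f_k\bigl(\tfrac14\bigr). \tag{2.25} $$

The recurrence (2.10) yields

$$ \alpha_{k+1} = 1 + \tfrac14 \alpha_k^2, \qquad \alpha_0 = 1. \tag{2.26} $$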
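
The sequence $\alpha_k = f_k(1/4)$, which satisfies
$\alpha_{k+1} = 1 + \alpha_k^2 / 4$ with $\alpha_0 = 1$, can be tabulated
directly. The sketch below is illustrative only; the function name `alphas`
is hypothetical. Writing $\beta_k = 2 - \alpha_k$, the recurrence becomes
$\beta_{k+1} = \beta_k - \beta_k^2 / 4$, which suggests that $\beta_k$ decays
roughly like 4/k; the last printed column lets one inspect that heuristic.

```python
def alphas(count):
    """Return [alpha_0, ..., alpha_count] with alpha_{k+1} = 1 + alpha_k**2 / 4."""
    values = [1.0]
    for _ in range(count):
        values.append(1 + values[-1] ** 2 / 4)
    return values


a = alphas(1000)
for k in (1, 10, 100, 1000):
    # columns: k, alpha_k, k * (2 - alpha_k)
    print(k, a[k], k * (2 - a[k]))
```

The values increase toward the fixed point 2 of the map, but only at an
algebraic rate, in contrast with the geometric convergence obtained on the
rest of the circle $|z| = 1/4$.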